APPENDIX D.
HYPERTEXT INFORMATION ACCESS STUDY
INTERVIEW SUMMARY
NEIL LARSON
BERKELEY, CAL.
MARCH 11, 1991
A. HYPERTEXT ARCHIVE TRANSACTION/SUPPORT SYSTEM:
A.1. Please summarize the basic hypertext document content
assembly & maintenance procedures.
The database consists of accounting and auditing
information converted to a hypertext format for
Deloitte & Touche. A Deloitte & Touche group has
defined the strategic plan for the database. This
includes content, maintenance, and plans for expansion.
Deloitte & Touche takes care of providing all material
for the database.
** The information arrives almost entirely in hard
copy printed form, and must be converted to electronic
format. The current operation uses a Kurzweil OCR
unit for conversion processing, and can convert
approximately 500 pages per day, if needed, with two
people working. One person handles physical
processing of the OCR unit; the other handles spell-
checking, error correction, and initial text
formatting.
** Document inspection, analysis, and conversion to a
screen format suitable for hypertext are the next step.
The major task is breaking a linear document into
separate hierarchical sections. This includes several
subtasks. First, the screen format is adapted for
best display and user comprehension. The document
must be divided into multiple short sections,
attempting to form logical text units or hypertext
nodes covering a single topic, within a preferred
maximum of one screen length. As source text is split
up into hypertext nodes, the author embeds links to
other relevant sections, and "continuity links" to
previous and next sections.
** The author must also insert links which fit the
total document into the system's overall conceptual
hierarchy. He is continually revising and redefining
that conceptual hierarchy.
MaxThink hypertext links are designated in the form of
the target MS-DOS filename surrounded by angle bracket
characters. This link convention is case-insensitive.
E.g., a link can take either of the following forms:
<filename>
<FILENAME>.
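The link convention described above can be illustrated with a
short sketch. This is not MaxThink code; it only assumes that a
link is an MS-DOS style file name (up to 8 characters, with an
optional 3-character extension) enclosed in angle brackets. The
names LINK_RE and extract_links are hypothetical.

```python
import re
from typing import List

# Assumption: a link target is an MS-DOS 8.3 style name inside angle brackets.
LINK_RE = re.compile(r"<([A-Za-z0-9_\-]{1,8}(?:\.[A-Za-z0-9]{1,3})?)>")


def extract_links(node_text: str) -> List[str]:
    """Return link targets in order of appearance, folded to upper case.

    Folding to one case mirrors the case-insensitive convention:
    <filename> and <FILENAME> name the same target.
    """
    return [m.group(1).upper() for m in LINK_RE.finditer(node_text)]


sample = "See <audit01> for scope and <AUDIT01> again, then <tax.txt>."
print(extract_links(sample))  # ['AUDIT01', 'AUDIT01', 'TAX.TXT']
```

Folding every target to upper case before comparison is one
simple way to honor the case-insensitive rule.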
A.2. GENERAL PRESENTATION AND PRODUCTION DESIGN OF THE
HYPERTEXT ACCESS SYSTEM:
A.2.a. Describe the general arrangement of the main
document file. (Unique document identification,
general logical arrangement, basic principle of
access)
MaxThink elected to use plain MS-DOS ASCII text
files for basic node text storage. The text files are
directly accessible by the combination of subdirectory
name and file name.
MS-DOS file retrieval performance seriously degrades
if there are substantially more than 100 files in a
disk subdirectory. They solved this limitation by
using a system of hierarchical or specialized
subdirectories, limiting each to approximately 100
files. They use a general classification approach in
assigning files to subdirectories. [NOTE:
Subdirectory approach is also covered in the Phillips
interview notes.]
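The subdirectory limit can be sketched as follows. This is an
illustration only, assuming that files have already been grouped
by a general class and that a class is split into numbered
subdirectories once it passes the limit. The function name and
the numbered naming scheme (AUDIT1, AUDIT2, ...) are hypothetical;
the interview does not describe the actual naming.

```python
from typing import Dict, List

MAX_FILES = 100  # approximate per-subdirectory ceiling reported in the interview


def assign_subdirs(files_by_class: Dict[str, List[str]],
                   limit: int = MAX_FILES) -> Dict[str, List[str]]:
    """Map subdirectory name -> files, splitting a class that exceeds `limit`."""
    layout: Dict[str, List[str]] = {}
    for cls, files in files_by_class.items():
        ordered = sorted(files)
        for start in range(0, len(ordered), limit):
            # Hypothetical naming: class name plus a sequence number.
            subdir = f"{cls}{start // limit + 1}"
            layout[subdir] = ordered[start:start + limit]
    return layout


files = {"AUDIT": [f"A{i:04d}.TXT" for i in range(250)]}
layout = assign_subdirs(files)
print({name: len(members) for name, members in layout.items()})
```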
They use file-naming conventions to produce unique
text filenames. These standardized names may reflect
a combination of factors, such as source of
information, document type, time/date of publication,
source file section, etc. The conventions are
generally mnemonic, so users easily learn the coding,
and can predict file content.
A.2.b. Please summarize the general concepts of the
system's "user interface," the document access and
display methods, design of the presentation means,
etc.
Larson says the design goal was to arrange information
in a clear, simple manner, so that people can find
it. They developed a hypertext presentation mechanism
which they feel is intuitively obvious. They also
attempted to design powerful hierarchy and indexing
approaches, so the material would be accessible from
many different viewpoints.
The interface design is based almost entirely upon use
of the four cursor arrow keys. The arrows are a
metaphor for "jumping" to another location in the
information base. The up and down cursor arrows
select from links displayed on the screen; the right
cursor arrow executes the jump; the left cursor arrow
backtracks to the origin or "jump-off" location.
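The four-key navigation model can be sketched as a cursor over
the links on the current screen plus a stack of jump-off
locations. This is a minimal illustration, not the MaxThink
implementation; the Navigator class, its method names, and the
sample node names are hypothetical.

```python
from typing import Dict, List, Tuple


class Navigator:
    """Arrow-key navigation: up/down select a link, right jumps, left backtracks."""

    def __init__(self, links: Dict[str, List[str]], start: str) -> None:
        self.links = links            # node name -> link targets shown on its screen
        self.current = start
        self.cursor = 0               # index of the currently selected link
        self.trail: List[Tuple[str, int]] = []  # stack of (node, cursor) jump-off points

    def _targets(self) -> List[str]:
        return self.links.get(self.current, [])

    def up(self) -> None:
        if self.cursor > 0:
            self.cursor -= 1

    def down(self) -> None:
        if self.cursor < len(self._targets()) - 1:
            self.cursor += 1

    def right(self) -> None:
        targets = self._targets()
        if targets:                   # a node without links has nowhere to jump
            self.trail.append((self.current, self.cursor))
            self.current = targets[self.cursor]
            self.cursor = 0

    def left(self) -> None:
        if self.trail:                # at the starting node there is nothing to undo
            self.current, self.cursor = self.trail.pop()


nav = Navigator({"MENU": ["A", "B"], "B": ["C"]}, "MENU")
nav.down()
nav.right()
print(nav.current)  # B
nav.left()
print(nav.current, nav.cursor)  # MENU 1
```

Saving the cursor position with each jump means that
backtracking returns the user to the same selected link.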
Larson says, "This hypertext navigation metaphor is so
simple that it takes a user about 30 seconds to learn.
It is complemented by providing an effective
hierarchical system of networked menus, in combination
with an indexing system." Both approaches use
"embedded menus." These feature obvious, eye-
readable, hypertext links, used along with a clear and
obvious menu structure, or descriptive surrounding
text.
He goes on, "Our menus attempt to build a conceptual
structure of the topic. We use metaphors to express
the thought patterns or structures relating to the
topic. We intend to express the domain structure with
such memorable, obvious, metaphors, that users will
adopt the structure; that it becomes their structure."
A.2.c. Identify and briefly describe the general
production tools or building tools used in
construction of the system.
Larson describes their approach to hypertext
construction as generally building the system out of
nodes or fragments of information, which have been
"decomposed" from original printed documents. He
notes the necessity for identifying the information
content in the nodes, and linking or sequencing them
into a meaningful, communicative, knowledge structure.
He describes the use of three major tools for building
the hypertext system. They include an editor, used
for formatting and insertion of links; an outliner,
used to form hypertext hierarchies; and a matrix
outliner, or network builder, used to create complex
hypertext networks. (More fully described in the next
section.)
He feels that these three tools give them the ability
to construct three powerful and complementary
approaches. He describes these as:
* Taxonomic approach - using hierarchies
* Linguistic approach - using the glossary index
* Hypertext network - using the complex
interconnected networks.
He also mentions the use of various utility programs,
described in the next section.
A.2.d. Identify and briefly describe the specialized
organizational and quality control tools which allow
you to build the system.
** "TransText" - the hypertext word processor. They feel
this editor to be the most important tool. It is used
for formatting, editing, "splitting up" or breaking
the file into nodes, and for insertion of hypertext
links. It thus handles both transformation of the
file information into effective, communicative,
display format, as well as the insertion of the links
themselves.
** "MaxThink" - outliner, used as the major
hierarchical tool. It can create classes, sequence,
boundaries, and hierarchies (with inheritance). It is
used to create logical structures or metaphors of the
information domain, which can automatically generate
hypertext hierarchies.
** "Houdini" - network-building tool. This program
is a matrix outliner, and can build "3-dimensional"
outlines, where any node can be connected to any other
node. These networks can also interconnect to and
within other networks. Again, the Houdini matrix
networks can automatically generate hypertext
networks. The network headings also generate a KWOC
"glossary" index, which is always instantly available
to the user.
Larson pointed out that they also use a number of
specialized utility programs, for specialized editing
and control functions. Some examples are:
** REFALL - shows all hypertext jumps FROM a file.
Good for analyzing patterns of hypertext linkage.
** INVERT - shows all hypertext jumps TO a file.
Good for analyzing patterns of hypertext linkage.
** CONNECT - shows all generations of input and
output links to a group of specified files. Good for
analyzing patterns of hypertext linkage.
** LINE - creates a <linked> list of all hypertext
source text nodes, including title line or descriptive
first line of text. List can be used with the
TransText editor, or imported into the MaxThink
outliner or Houdini matrix outliner. Good for
identification and network incorporation of text
content nodes.
** IC - (Integrity Checker) used to check for blind
references to non-existent files, for link name
errors.
** Glossary-building utilities - produce an "online
index" to network nodes and file titles, presented in
KWOC format. They apply depluralization, synonym
control, and sorting of index entries by source
document type.
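The link-analysis utilities listed above can be approximated in
a few lines. The sketch below is not the MaxThink code. It
assumes nodes are held in memory as a mapping from file name to
text, and that any run of non-blank characters inside angle
brackets is a link. The function names outgoing, incoming, and
blind_references are hypothetical stand-ins for the roles the
interview attributes to REFALL, INVERT, and IC.

```python
import re
from typing import Dict, Set

# Assumption: anything of the form <name> with no blanks inside is a link.
LINK_RE = re.compile(r"<([^<>\s]+)>")


def outgoing(nodes: Dict[str, str]) -> Dict[str, Set[str]]:
    """REFALL-like view: for each file, the set of files it jumps TO."""
    return {name.upper(): {t.upper() for t in LINK_RE.findall(text)}
            for name, text in nodes.items()}


def incoming(out: Dict[str, Set[str]]) -> Dict[str, Set[str]]:
    """INVERT-like view: for each file, the set of files that jump to it."""
    inc: Dict[str, Set[str]] = {name: set() for name in out}
    for source, targets in out.items():
        for target in targets:
            inc.setdefault(target, set()).add(source)
    return inc


def blind_references(out: Dict[str, Set[str]]) -> Dict[str, Set[str]]:
    """IC-like check: links whose target file does not exist."""
    known = set(out)
    return {source: targets - known
            for source, targets in out.items() if targets - known}


nodes = {"MENU": "<a> <B> <missing>", "A": "back to <menu>", "B": ""}
out = outgoing(nodes)
print(incoming(out)["MENU"])   # {'A'}
print(blind_references(out))   # {'MENU': {'MISSING'}}
```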
B. THE HYPERTEXT INFORMATION ACCESS SYSTEM:
B.1. ACCESS POINTS - Which of the following types of
access points are included in your system?
For each question item, please rate using the following
categories, and comments as needed...
P)resent, E)asily achievable, M)odifications needed,
N)ot achievable
B.1.a. Main file sequence - direct file access
Category: [P] E M N
Hypertext nodes retrievable by ASCII file name.
B.1.b. Author
Category: [P] E M N
Editorial decision. Author indexing is included in
DaTa, in many instances.
B.1.c. Title
Category: [P] E M N
Editorial decision. Included in DaTa, in many cases.
B.1.d. Name forms
Category: [P] E M N
Editorial decision. Optionally included.
B.1.d.i. Personal names
Category: [P] E M N
Editorial decision. Optionally included.
B.1.d.ii. Corporate names (Companies, organizations,
government, etc.)
Category: [P] E M N
Editorial decision. Optionally included.
B.1.e. Keywords
Category: [P] E M N
Keyword access through "Glossary" KWOC index.
B.1.f. Subject/Topic/Concept
Category: [P] E M N
Via hierarchy, network, and KWOC index.
B.1.g. Geographic
Category: [P] E M N
Editorial decision. Optionally included. Present in
DaTa as part of hierarchy.
B.1.h. Date, chronological, temporal
Category: [P] E M N
Editorial decision. Optionally included. Present in
DaTa as part of hierarchy, as well as in filename
conventions.
B.1.i. Language
Category: [P] E M N
This is purely an editorial decision; the capability
is present. Minor software modifications may be
needed, to handle ASCII extended character set for
foreign languages.
B.1.j. Document format - book, article, pamphlet, report,
etc.
Category: P [E] M N
Editorial decision. Optionally included.
B.1.k. Document position - section, page, location
Category: [P] E M N
Editorial decision. Can optionally be included as
part of hierarchy. This would be labor-intensive. It
would be most efficient to add this as a link call to
an external searching program, with the ability to
handle positional or string specifications.
B.1.l. Automated field specifications - record size, entry
date, notations, originator, etc.
Category: [P] E M N
MaxThink utilities include a string-searching program,
callable from embedded hypertext link. The hypertext
links can similarly call any external DOS program.
[The investigator, for example, has built a system
with link calls to the Zyindex text search & retrieval
program. The Zyindex index file allowed full-text
search of the entire hypertext database, in addition
to regular hypertext links.]
B.2 ACCESS APPROACHES - Which of the following subject or
topical information devices are used in your system?
For each question item, please rate using the following
categories, and comments as needed...
P)resent, E)asily achievable, M)odifications needed,
N)ot achievable
B.2.a. Classification schemes
B.2.a.i. Hierarchical taxonomy
Category: [P] E M N
Yes, we view the generated hierarchy and linked
network as a classification scheme, more flexible and
powerful than the standard linear taxonomy.
B.2.a.ii. Enumerative, universal, classification [Dewey
type classification]
Category: P [E] M N
Editorial decision. Optionally included. Any
classification can be embedded or expressed in the
hypertext hierarchy.
B.2.a.iii. Specialized, literary warrant, classification
[Library of Congress, Reader Interest Classification]
Category: P [E] M N
Editorial decision. Optionally included. Any
classification can be embedded or expressed in the
hypertext hierarchy.
B.2.a.iv. Faceted classification (analytico-synthetic)
[PRECIS style of indexing] [C., p.65]
Category: P [E] M N
Editorial decision. Optionally included. Any
classification can be embedded or expressed in the
hypertext hierarchy.
B.2.b. Indexing approaches
B.2.b.i. Alphabetical index, separate or dictionary file
Category: [P] E M N
Present.
B.2.b.i.A. Keywords, extracted or assigned
Category: [P] E M N
They have utilities for term extraction and will be
developing these further. The KWOC index utility rotates
assigned network headings or file title words.
B.2.b.i.B. Controlled vocabulary assignment
Category: P [E] M N
Editorial decision. Optionally included. At present,
the KWOC index utility optionally rotates either
assigned network headings or file title phrases.
B.2.b.i.C Relative index, e.g., to Dewey classification
Category: P [E] M N
Editorial decision. Optionally included via taxonomy.
B.2.b.ii. Term manipulation indexes (generally for
production of printed output)
Category: [P] E M N
An integral part of the system.
B.2.b.ii.A. Simple permuted or rotated - KWIC
Category: P [E] M N
Editorial decision. Optionally included.
B.2.b.ii.B. Ordered by extracted element - KWOC
Category: [P] E M N
An integral part of the system.
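As a rough illustration of a KWOC (keyword out of context)
listing, each significant title word is pulled out as a heading
and followed by the full title and file name. The sketch below
is an assumption-laden example, not the MaxThink glossary
utility: the stopword list, the function name kwoc, and the
sample titles are all hypothetical.

```python
from typing import Dict, List, Tuple

# Illustrative stopword list; the actual list is not given in the interview.
STOPWORDS = {"a", "an", "and", "for", "in", "of", "the", "to"}


def kwoc(titles: Dict[str, str]) -> List[Tuple[str, str, str]]:
    """Return sorted (keyword, full title, filename) entries."""
    entries: List[Tuple[str, str, str]] = []
    for filename, title in titles.items():
        seen = set()
        for word in title.split():
            key = word.strip(".,;:()\"'").upper()
            # Skip empty tokens, stopwords, and repeats within one title.
            if key and key.lower() not in STOPWORDS and key not in seen:
                seen.add(key)
                entries.append((key, title, filename))
    return sorted(entries)


titles = {"AUD01": "Audit of Inventory", "TAX01": "Inventory Tax"}
for keyword, title, filename in kwoc(titles):
    print(f"{keyword:<10} {title}  <{filename}>")
```

Printing the file name in angle brackets follows the link
convention described earlier, so each index line could double
as a hypertext link.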
B.2.b.ii.C. String indexing (phrase-manipulation, rotation
of terms) - PRECIS, NEPHIS, etc.
Category: P [E] M N
Editorial decision. Optionally included. Achievable
by creating index with external utility, then
importing into taxonomy form.
B.2.b.ii.D. Chain indexing (string indexing, with forms
reflecting basic taxonomy of terms) [C., p. 67]
Category: P [E] M N
Editorial decision. Optionally included. Achievable
by creating index with external utility, then
importing into taxonomy form.
B.2.b.iii. Classified index (generally requires secondary
alphabetical index, for ease of use) [C., p. 56]
Category: P [E] M N
Editorial decision. Optionally included. Achievable
by creating index with external utility, then
importing into taxonomy form.
B.2.b.iv. Coordinate indexing - Manual coordination or
automated database file, using Boolean search [C., p.
60]
Category: P [E] M N
Editorial decision. Optionally included. Achievable
by call to external program.
B.2.b.iv.A. Older non-automated searching methods -
peekaboo, edge-notched cards, Uniterm terminal digit
cards
Category: P E M [N]
Not applicable. This system does not use a hard copy
format file record.
B.2.b.iv.B. Database file search - Sequential or indexed
field search
Category: P [E] M N
Editorial decision. Optionally included. Achievable
by call to external program.
B.2.b.iv.C. Full text search
Category: [P] E M N
During the interview, and elsewhere, Larson voices
strong subjective disapproval of this information
retrieval approach (Fersko-Weiss 1991). Nevertheless,
MaxThink provides SEARCH and CD-INDEX, two program
modules which provide this option. This is an
editorial decision; the text-searching feature may be
optionally included. The hypertext links can also
call other, more powerful, string-searching programs.
An example is National Legal Research Systems' Qwik-
Rules (TM) legal rules hypertext information system.
They used MaxThink hypertext software to build the
system, and provide links to QWIKFIND, their own text-
searching engine. As elsewhere mentioned, the
investigator himself has also built systems with link
calls to Zyindex, Golden Retriever, Power Search, and
other text-searching programs.
B.2.b.v. Faceted indexing [C., p 65]
Category: P [E] M N
Editorial decision. Optionally included. Achievable
by creating index with external utility, then
importing into taxonomy form.
B.2.b.vi. Citation indexing [C., p. 72]
Category: P [E] M N
Editorial decision. Optionally included. Achievable
by creating index with external utility, then
importing into taxonomy form.
B.3. CONTROL MECHANISMS - Which of the following subject
access control measures, intended to control
consistency, form, and item sequencing, are present in
your system?
For each question item, please rate using the following
categories, and comments as needed...
P)resent, E)asily achievable, M)odifications needed,
N)ot achievable
B.3.a. Classification schedule
Category: [P] E M N
The hierarchical taxonomy is equivalent to a flexible
classification schedule, in our opinion.
B.3.b. Vocabulary control systems
Category: [P] E M N
Editorial decision. Optionally included. Our
Glossary utility presently uses controls on form of
entry, e.g., depluralization (singular preferred),
synonym cross-references, stopword lists for the KWOC
index, automatic sorting by entry type. We are also
considering automatic word-stemming for the KWOC
index.
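The entry-form controls named above (depluralization, synonym
cross-references, stopwords) can be sketched as a single
normalization step applied to each candidate index term. This
is only an illustration under stated assumptions: the suffix
rules are a crude English heuristic, and the stopword list,
synonym table, and function names are hypothetical rather than
taken from the Glossary utility.

```python
from typing import Optional

STOPWORDS = {"a", "an", "and", "for", "in", "of", "the", "to"}  # illustrative
SYNONYMS = {"STOCK": "INVENTORY"}  # hypothetical: non-preferred -> preferred term


def depluralize(word: str) -> str:
    """Reduce a word to a singular-preferred form using simple suffix rules."""
    w = word.upper()
    if w.endswith("IES") and len(w) > 4:
        return w[:-3] + "Y"                      # POLICIES -> POLICY
    if w.endswith(("SSES", "XES", "CHES", "SHES")):
        return w[:-2]                            # TAXES -> TAX
    if w.endswith("S") and not w.endswith(("SS", "US", "IS")):
        return w[:-1]                            # AUDITS -> AUDIT
    return w


def normalize(word: str) -> Optional[str]:
    """Return the preferred index form of a word, or None for a stopword."""
    singular = depluralize(word)
    if singular.lower() in STOPWORDS:
        return None
    return SYNONYMS.get(singular, singular)


print(normalize("Policies"))  # POLICY
print(normalize("stocks"))    # INVENTORY
print(normalize("the"))       # None
```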
B.3.b.i. Authority/Headings files
Category: P [E] M N
Editorial decision. Optionally included. Achievable
by external manual or automated means.
B.3.b.ii. Thesaurus control
Category: P [E] M N
Editorial decision. Optionally included. Achievable
by external manual or automated means.
B.3.b.iii. Derived-term methods or algorithms
Category: P [E] M N
The DaTa operation already uses term extraction
utilities for analyzing files and groups of files.
MaxThink is considering developing more advanced term
extraction utilities, based on word frequency, per
Miranda Pao. This could also be achieved by using
third-party software for index term extraction.
B.3.b.iv. Hierarchical search thesaurus (for database file
search)
Category: P E M [N]
This approach is neither currently used nor realistic,
since the primary approach is not a "searching"
methodology. If an editorial decision mandated it,
authors could achieve this via a link call to an
external searching program with this capability, e.g.,
Zyindex or MicroBASIS.
B.3.b.v. Entry term form control mechanisms
Category: [P] E M N
Editorial decision. Optionally included. Achievable
externally, using manual or automated means.
B.3.b.v.A. Entry syntax (preferred noun/adjective, etc.,
construction form)
Category: [P] E M N
The present approach is entirely a matter of editorial
policy control. E.g., the DaTa CD-ROM product operates
with preferred usages.
B.3.b.v.B. Standard number approach (plural, singular
form preference)
Category: [P] E M N
The present DaTa approach prefers the singular form and
applies depluralization in the glossary KWOC utility.
B.3.b.v.C. Automatic depluralization (database file)
Category: P [E] M N
Not applicable using the associative linking approach.
Depluralization can be implemented in hypertext index
representations. The present DaTa approach prefers the
singular form and applies depluralization in the
glossary KWOC utility. As an alternative, an author
can also use links to external database software with
this capability.
B.3.b.v.D. Synonym definition (database file)
Category: [P] E M N
This is an editorial decision. The KWOC glossary
utility program includes automatic synonym handling,
cross-references, etc., for construction of the KWOC
index.
B.3.c. "Standard Subdivision" or faceted classification
protocol
Category: [P] E M N
Use standard extensions in filename conventions for
document types; also use standard coding to reflect
document types in network/glossary files. This also
results in sorting by document or node type in the
KWOC index.
B.3.d. Term or descriptor relationships - Roles, links,
weighting
Category: P [E] M N
Neither currently used nor realistic, since the primary
approach is not a "searching" methodology. If an
editorial decision mandated it, this could be achieved
by a link call to external searching programs with
this capability.
B.3.e. Filing or sorting rules
Category: [P] E M N
For convenience, they currently use straight ASCII
sort for the KWOC index, with sub-sorts by document or
node type. The network taxonomy certainly reflects a
subjective, author-imposed, ordering or hierarchy.
Any other sorting sequence for the KWOC could be
supported with the correct algorithm for the external
sorting utility.
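The sorting rule described above (straight ASCII order on the
heading, with a sub-sort by document or node type) can be
sketched with a composite sort key. This is an illustration
only. It assumes the document type is carried in the file name
extension, which is consistent with the filename conventions
mentioned in this appendix but is not confirmed as the sort
field; the function names and sample entries are hypothetical.

```python
from typing import List, Tuple

Entry = Tuple[str, str]  # (index heading, filename)


def doc_type(filename: str) -> str:
    """Assumption: the extension encodes the document type, e.g. A1.STD -> STD."""
    return filename.partition(".")[2].upper()


def sort_index(entries: List[Entry]) -> List[Entry]:
    """Sort by heading (code point order, i.e. ASCII order for ASCII text),
    then by document type, then by filename."""
    return sorted(entries, key=lambda e: (e[0], doc_type(e[1]), e[1]))


entries = [("TAX", "T1.STD"), ("AUDIT", "A2.STD"),
           ("AUDIT", "A1.MEM"), ("AUDIT", "A0.STD")]
for heading, filename in sort_index(entries):
    print(heading, filename)
```

A different filing sequence could be obtained by replacing the
key function, which is the role the text assigns to the
external sorting utility.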
B.3.f. Manual or automated authority/procedural safety
measures
Category: [P] E M N
A full set of utilities, described above, is used for
checking linking patterns, clustering, link name
spelling errors, blind references, file text contents,
etc. In addition, the production team follows standard
computer operating practices: file backups, off-site
copies, working-copy backups, etc.